Use of multiple alignments in protein secondary structure prediction
Abstract
Using a new database of 20 proteins not included in any of the previously used training datasets, we have incorporated multiple alignment information from homologous proteins into two well-characterized prediction methods: COMBINE (a jury method) and the Q-L (or quadratic-logistic) method. The increase in accuracy from the use of related proteins is similar for both methods (5.8% and 6.3%, respectively), yielding a per-residue prediction accuracy (Q3) of 68.7% and 69.0%, respectively, for a three-state prediction. Most of the improvement came from averaging, profiling or consensus predictions. A small part of this improvement (0.5%) came from recognizing that "gap-permissive" positions in the alignment are most frequently in the coil state. This finding is consistent with the hypothesis that the aligned family shares a common secondary structure and that the improved accuracy results from reduced noise in the prediction.
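The two ideas in the abstract, a per-residue Q3 score and a consensus over aligned homologues with gap-rich columns treated as coil, can be illustrated with a minimal Python sketch. This is not the COMBINE or Q-L method: a simple per-column majority vote stands in for the averaging, profiling or consensus step. The function names, the `H`/`E`/`C` state alphabet, the tie-breaking rule and the `gap_fraction_for_coil` threshold of 0.5 are all assumptions made for illustration, since the abstract does not define how a gap-permissive position is identified.

```python
from collections import Counter

STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def consensus_prediction(aligned_predictions, gap_fraction_for_coil=0.5):
    """Majority vote per alignment column (hypothetical stand-in for averaging).

    aligned_predictions: equal-length strings over 'H', 'E', 'C', with '-'
    marking alignment gaps. The first row is the target protein.
    Columns where at least `gap_fraction_for_coil` of the rows are gaps are
    treated as gap-permissive and assigned coil (assumed threshold).
    """
    target = aligned_predictions[0]
    n_rows = len(aligned_predictions)
    result = []
    for col in range(len(target)):
        if target[col] == "-":
            continue  # column has no residue in the target protein
        column = [row[col] for row in aligned_predictions]
        if column.count("-") / n_rows >= gap_fraction_for_coil:
            result.append("C")
            continue
        votes = Counter(s for s in column if s != "-")
        best = max(votes.values())
        winners = [s for s in STATES if votes.get(s, 0) == best]
        # Ties are broken in favour of the target's own prediction.
        result.append(target[col] if target[col] in winners else winners[0])
    return "".join(result)


# Toy data, invented for illustration only.
rows = ["HCHEEC", "HHEE-C", "HHHE-C", "CHHH-C"]
observed = "HHHECC"
single = rows[0]
combined = consensus_prediction(rows)
print(q3(single, observed), q3(combined, observed))
```

On this toy input the vote corrects two positions where the target's single-sequence prediction was wrong, which is the kind of noise reduction the abstract describes.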
Similar articles
In unison: regularization of protein secondary structure predictions that makes use of multiple sequence alignments.
We present a method whose purpose is to post-process the fuzzy results of secondary structure prediction methods that use multiple sequence alignments, in order to obtain 'realistic' secondary structures, i.e., secondary structure elements whose length is greater than or equal to some predefined minimum length. This regularization helps with interpretation of the secondary structure prediction.
Computational methods for protein secondary structure prediction using multiple sequence alignments.
Efforts to use computers in predicting the secondary structure of proteins based only on primary structure information started over a quarter century ago [1-3]. Although the results were encouraging initially, the accuracy of the pioneering methods generally did not attain the level required for using predictions of secondary structures reliably in modelling the three-dimensional topology of pr...
Analysis of the Effects of Multiple Sequence Alignments in Protein Secondary Structure Prediction
Secondary structure prediction methods are widely used bioinformatics algorithms that provide initial insights about protein structure from sequence information. Significant efforts have been made over the past years to improve prediction accuracy, especially through the incorporation of information from multiple sequence alignments. This motivated the search for the factors contributing to this improvemen...
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
A DNA sequence, although it contains all genetic traits, is not a functional entity. Instead, it is converted into protein sequences through transcription and translation. The protein sequence then takes on a 3D structure, which is the functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can think of is undertaken by proteins with specific functio...
StatAlign 2.0: combining statistical alignment with RNA secondary structure prediction
MOTIVATION: Comparative modeling of RNA is known to be important for making accurate secondary structure predictions. RNA structure prediction tools such as PPfold or RNAalifold use an aligned set of sequences in predictions. Obtaining a multiple alignment from a set of sequences is quite a challenging problem itself, and the quality of the alignment can affect the quality of a prediction. By im...
Improving Prediction of Protein Secondary Structure Using Structured Neural Networks and Multiple Sequence Alignments
The prediction of protein secondary structure by use of carefully structured neural networks and multiple sequence alignments has been investigated. Separate networks are used for predicting the three secondary structures alpha-helix, beta-strand, and coil. The networks are designed using a priori knowledge of amino acid properties with respect to the secondary structure and the characteristic ...